Vectorize @ inbounds for x in A ... #13866

Merged
merged 1 commit into from
Nov 4, 2015

simonster
Member

This would previously have been an infinite loop if `length(A) == typemax(Int)`, so the loop vectorizer couldn't compute a trip count. Ref #13860 (comment)
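
To make the overflow argument concrete, here is a minimal sketch. The helper names `done_gt`, `done_eq`, and `trip_count` are hypothetical and are not the definitions in Base. The `i > n` form is the one a later commit message in this thread says was used before this PR. The equality form is only an assumed alternative, shown because it terminates after exactly `n` increments even when `n + 1` wraps around.

```julia
# Hypothetical termination checks; NOT the actual Base definitions.
done_gt(i::Int, n::Int) = i > n        # "greater than" style check
done_eq(i::Int, n::Int) = i == n + 1   # assumed equality-style alternative

# Count how many times a loop body would run for a container of length n.
function trip_count(done, n::Int)
    i = 1
    count = 0
    while !done(i, n)
        i += 1
        count += 1
    end
    return count
end

n = typemax(Int)

# Int arithmetic wraps silently in Julia.
println(typemax(Int) + 1 == typemin(Int))

# No Int is greater than typemax(Int), so the `>` check can never fire.
println(done_gt(typemax(Int), n))

# The equality check fires once i has wrapped to typemin(Int), i.e. after n steps.
println(done_eq(typemin(Int), n))

# For ordinary lengths both checks give the same trip count.
println(trip_count(done_gt, 5), " ", trip_count(done_eq, 5))
```

With the `>` check, the compiler has to account for the case where the loop never exits, which is why a fixed trip count cannot be derived for it.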

@simonster
Member Author

Actually, maybe we should start `i` at zero here instead of one and adjust everything else to match? This doesn't seem to make much of a difference if the loop gets vectorized, but I get a ~10% perf boost on `count1` from #13860 (comment) without `@inbounds`. With `i` starting at zero, on LLVM 3.3, the ASM is:

L31:    cmpq    %r8, %rcx
        jae     L82
        movq    (%rdi), %rdx
Source line: 4
Source line: [inline] float.jl:269
        subq    %rsi, %rdx
        vcmpneqsd       (%rdx), %xmm0, %xmm1
        vmovd   %xmm1, %edx
        andl    $1, %edx
        addq    $-8, %rsi
Source line: 3
        incq    %rcx
Source line: 4
Source line: [inline] float.jl:269
        addq    %rdx, %rax
        cmpq    %rcx, %r8
        jne     L31

vs. with `i` starting at one:

L37:    cmpq    %r8, %rcx
        jae     L95
Source line: 4
Source line: [inline] float.jl:269
        leaq    (,%rsi,8), %r10
Source line: 3
        movq    (%rdi), %rdx
Source line: 4
Source line: [inline] float.jl:269
        subq    %r10, %rdx
        vcmpneqsd       (%rdx), %xmm0, %xmm1
        vmovd   %xmm1, %edx
        andl    $1, %edx
        incq    %rcx
        decq    %rsi
        addq    %rdx, %rax
        cmpq    %rsi, %r9
        jne     L37

OTOH, LLVM 3.6 appears to be smarter and this doesn't make a difference there, so maybe this isn't worth it?
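
For readers without the linked comment at hand, the two indexing schemes being compared can be sketched roughly as follows. The functions `count_ne_one_based` and `count_ne_zero_based` are hypothetical stand-ins written for this illustration. The real `count1` lives in #13860 and is not reproduced here. The `!=` comparison is only a guess based on the `vcmpneqsd` instruction in the listings above.

```julia
# Hypothetical stand-ins for the benchmark; `count1` itself is defined in #13860.

# State starts at one: the index is used directly, the end test needs length + 1.
function count_ne_one_based(A::Vector{Float64}, v::Float64)
    n = 0
    i = 1
    while i != length(A) + 1
        n += A[i] != v      # Bool is added as 0 or 1
        i += 1
    end
    return n
end

# State starts at zero: the end test compares against length directly,
# and the element access adds one instead.
function count_ne_zero_based(A::Vector{Float64}, v::Float64)
    n = 0
    i = 0
    while i != length(A)
        n += A[i + 1] != v
        i += 1
    end
    return n
end

A = [1.0, 2.0, 1.0, NaN]
println(count_ne_one_based(A, 1.0), " ", count_ne_zero_based(A, 1.0))
```

Both versions compute the same result; any speed difference would come only from how the compiler lowers the index arithmetic, which, as noted above, seems to depend on the LLVM version.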

@timholy
Member

timholy commented Nov 4, 2015

Possibly related to #9182.

simonster added a commit that referenced this pull request Nov 4, 2015
Vectorize `@ inbounds for x in A ...`
@simonster simonster merged commit 1d270c6 into master Nov 4, 2015
@simonster simonster deleted the sjk/array-vectorize branch November 4, 2015 23:24
@nalimilan
Member

This deserves a comment explaining why the function isn't written in the most natural way, especially since it isn't covered by the tests, which means anybody might break it by rewriting it in an apparently better form.
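
One way such a check could be approached is sketched below. It assumes a recent Julia (1.0 or later, not the version this PR targeted) and uses `code_llvm` from the `InteractiveUtils` standard library. The regex is a rough heuristic rather than an established test, and whether the loop actually vectorizes depends on the LLVM version and the target CPU, so the result is only printed, not asserted.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

function f(A)
    r = 0
    @inbounds for x in A
        r += x
    end
    r
end

# Capture the optimized LLVM IR for a Vector{Int} argument as a string.
ir = sprint(io -> code_llvm(io, f, Tuple{Vector{Int}}))

# Heuristic (an assumption): vectorized integer loops usually mention
# vector types such as `<4 x i64>` somewhere in the IR.
looks_vectorized = occursin(r"<\d+ x i64>", ir)
println("vector IR detected: ", looks_vectorized)
```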

simonster added a commit that referenced this pull request Nov 7, 2015
This would previously have been an infinite loop if
`length(A) == typemax(Int)` so the loop vectorizer couldn't compute a
trip count.

(cherry picked from commit fa89a6e)
ref #13866
mbauman added a commit that referenced this pull request May 11, 2018
Currently, if a vector is resized in the midst of iteration, then `done` might "miss" the end of iteration. This trivially changes the definition to catch such a case. I am not sure what guarantees we make about mutating iterables during iteration, but this seems simple and easy to support.  Note, though, that it is somewhat tricky: until #13866 we used `i > length(a)`, but that foils vectorization due to the `typemax` case. This definition seems to get the best of both worlds. For a definition like `f` below, this new definition just requires one extra `add i64` operation in the preamble (before the loop). Everything else is identical to master.

```julia
function f(A)
    r = 0
    @inbounds for x in A
        r += x
    end
    r
end
```
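
The resize problem described in this commit message can be sketched as follows. All names here are hypothetical, and `done_unsigned` is only one possible formulation that handles both the resize case and the `typemax` case. The commit message does not show the actual definition, so treat it as an assumption. The simulation tracks only the iteration state and never indexes the vector.

```julia
# Hypothetical termination checks; not the actual Base definitions.
done_eq(i::Int, n::Int) = i == n + 1                          # equality style
done_unsigned(i::Int, n::Int) = (i % UInt) - 1 >= (n % UInt)  # assumed alternative

# Count loop iterations over a copy of `a`, optionally shrinking it to one
# element when the state reaches `shrink_at`. Returns -1 if the loop runs away.
function visits(done, a::Vector{Int}; shrink_at::Int = 0)
    a = copy(a)
    i = 1
    seen = 0
    while !done(i, length(a))
        seen += 1
        if i == shrink_at
            resize!(a, 1)            # mutate the vector mid-iteration
        end
        i += 1
        seen > 100 && return -1      # guard: the end of iteration was missed
    end
    return seen
end

println(visits(done_eq, collect(1:5)))                      # no resize
println(visits(done_eq, collect(1:5); shrink_at = 3))       # misses the end
println(visits(done_unsigned, collect(1:5); shrink_at = 3)) # stops after the resize
```

The equality check steps past the new, shorter length and keeps going, while the unsigned comparison stops as soon as the state is beyond the current length. The unsigned form also still terminates when the length is `typemax(Int)`, because the wrapped state compares as a large unsigned value.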
mbauman added a commit that referenced this pull request May 14, 2018
mbauman added a commit that referenced this pull request May 15, 2018
* More robust iteration over Vectors
